Learning string distance with smoothing for OCR spelling correction
Similar resources
A Comparison of Four Character-Level String-to-String Translation Models for (OCR) Spelling Error Correction
We consider the isolated spelling error correction problem as a specific subproblem of the more general string-to-string translation problem. In this context, we investigate four general string-to-string transformation models that have been suggested in recent years and apply them within the spelling error correction paradigm. In particular, we investigate how a simple ‘k-best decoding plus dict...
Context-Based Spelling Correction for Japanese OCR
We present a novel spelling correction method for those languages that have no delimiter between words, such as Japanese, Chinese, and Thai. It consists of an approximate word matching method and an N-best word segmentation algorithm using a statistical language model. For OCR errors, the proposed word-based correction method outperforms the conventional character-based correction me...
Statistical Learning for OCR Text Correction
The accuracy of Optical Character Recognition (OCR) is crucial to the success of subsequent applications in a text analysis pipeline. Recent models of OCR post-processing significantly improve the quality of OCR-generated text, but are still prone to suggest correction candidates from limited observations while insufficiently accounting for the characteristics of OCR errors. In this paper, ...
Approximate string matching algorithms for limited-vocabulary OCR output correction
Five methods for matching words mistranslated by optical character recognition to their most likely match in a reference dictionary were tested on data from the archives of the National Library of Medicine. The methods, including an adaptation of the cross correlation algorithm, the generic edit distance algorithm, the edit distance algorithm with a probabilistic substitution matrix, Bayesian a...
Conceptual Distance and Automatic Spelling Correction
ABSTRACT. Text from different sources usually arrives under imperfect conditions. When an anomalous word is detected, automatic word recognisers produce a list of candidates from which only one is correct. A variety of techniques have been devised to discriminate among the possible correction candidates. The project we are involved in tries to exploit linguistic knowledge in Spelling Correction...
Journal
Journal title: Multimedia Tools and Applications
Year: 2016
ISSN: 1380-7501,1573-7721
DOI: 10.1007/s11042-016-4185-5